Communicating Research Findings

PSCI 2270 - Week 12

Georgiy Syunyaev

Department of Political Science, Vanderbilt University

November 28, 2023

Plan for this week



  1. Presentations next week

  2. Communicating research findings

  3. Q&A on projects

Presentations next week

Reminders


  • Complete Human Subjects Training provided by CITI

    • You can receive the certificate for that through Vandy for free here (read guide here)
    • You need to complete the basic module and upload it via Brightspace (not through OSF)
  • Final presentations are next week

    • Each of you will have 5-7 minutes to present, followed by 3-5 minutes of feedback
    • Sign up for your slot here by Thursday
    • There will be pizza and drinks!
  • Q&A in the last 20 minutes of class on Thursday

Final presentations


  • 5-7 minutes \(\approx\) 5-7 slides
  • Do not put too much text on the slides:

    • \(6 \times 6\) rule: Unless absolutely unavoidable, don’t put more than 6 words per line and 6 lines per slide
    • Try not to read from slides
  • What to include: Motivation, Research Question, Hypotheses, Research Design (Context/Unit of analysis/Experiment or Observational), Independent/Dependent variables, Measurement and procedures

Communicating research findings

The issue with communication


  • There is a lot of information in the world

    • There is even a lot of information in your individual projects… TOO MUCH
  • You have two tasks

    1. Understand what results you have: Descriptive, observational, experimental
    2. Decide how to present your results: Simplify and tell a story with your data
  • In the end you will have a result from the data AND you will want others to understand and believe it

How we usually communicate

Two main approaches: Tables and Graphs

  • Tables can be more informative for an advanced audience (one that understands point estimates and standard errors)
  • Graphs can be more informative for a general audience
  • Sometimes it is not enough to just look at the table! \(\Rightarrow\) Always plot your data!

Anscombe’s Quartet


  • F.J. Anscombe (1973): “…make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding.”
as_tibble(anscombe)
# A tibble: 11 × 8
      x1    x2    x3    x4    y1    y2    y3    y4
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1    10    10    10     8  8.04  9.14  7.46  6.58
 2     8     8     8     8  6.95  8.14  6.77  5.76
 3    13    13    13     8  7.58  8.74 12.7   7.71
 4     9     9     9     8  8.81  8.77  7.11  8.84
 5    11    11    11     8  8.33  9.26  7.81  8.47
 6    14    14    14     8  9.96  8.1   8.84  7.04
 7     6     6     6     8  7.24  6.13  6.08  5.25
 8     4     4     4    19  4.26  3.1   5.39 12.5 
 9    12    12    12     8 10.8   9.13  8.15  5.56
10     7     7     7     8  4.82  7.26  6.42  7.91
11     5     5     5     8  5.68  4.74  5.73  6.89

Anscombe’s Quartet: Correlations 😎

  • There are four studies. Let’s look at the statistical relationship between \(X\) and \(Y\) for each of them
lm(y1 ~ x1, data = anscombe)

Call:
lm(formula = y1 ~ x1, data = anscombe)

Coefficients:
(Intercept)           x1  
     3.0001       0.5001  
lm(y3 ~ x3, data = anscombe)

Call:
lm(formula = y3 ~ x3, data = anscombe)

Coefficients:
(Intercept)           x3  
     3.0025       0.4997  
lm(y2 ~ x2, data = anscombe)

Call:
lm(formula = y2 ~ x2, data = anscombe)

Coefficients:
(Intercept)           x2  
      3.001        0.500  
lm(y4 ~ x4, data = anscombe)

Call:
lm(formula = y4 ~ x4, data = anscombe)

Coefficients:
(Intercept)           x4  
     3.0017       0.4999  
  • Note: The slopes above are regression coefficients, not correlations. To get the linear correlation, use the cor() function or regress standardized variables, e.g. scale(y1) ~ scale(x1)
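
To connect the fitted slopes to correlations, here is a minimal sketch that computes cor() for each of the four studies and compares it with the slope from a regression on standardized variables. The helper names (studies, correlations, standardized_slopes) are illustrative and not from the original slides. All four correlations come out at roughly 0.82, which is the point of the quartet: the summary numbers cannot tell the studies apart.

```r
# Anscombe's quartet ships with base R as the data frame `anscombe`.
studies <- 1:4

# Pearson correlation between x and y within each study
correlations <- sapply(studies, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})

# Slope from a regression on standardized variables (mean 0, sd 1);
# in a bivariate regression this slope equals the correlation
standardized_slopes <- sapply(studies, function(i) {
  x <- as.numeric(scale(anscombe[[paste0("x", i)]]))
  y <- as.numeric(scale(anscombe[[paste0("y", i)]]))
  unname(coef(lm(y ~ x))[2])
})

round(correlations, 2)
round(standardized_slopes, 2)
```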

Anscombe’s Quartet: Long format 👌

  • Long format is useful for ggplot2: Stack \(X\)’s and \(Y\)’s for each study on top of each other

    • In the tidyverse this long format is also called tidy (which is where the name tidyverse comes from)
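
The slides do not show how anscombe_tidy was built. One possible way to get from the wide anscombe data to the stacked layout, assuming the tidyr and dplyr packages, is sketched below; the actual construction used for the slides may differ.

```r
library(dplyr)
library(tidyr)

anscombe_tidy <- anscombe |>
  # keep track of the original row so each x stays paired with its y
  mutate(id = row_number()) |>
  pivot_longer(
    cols = -id,
    # "x1" -> value column "x", study "1"; "y3" -> value column "y", study "3"
    names_to = c(".value", "study"),
    names_pattern = "([xy])(\\d)"
  ) |>
  select(study, id, x, y)

anscombe_tidy
```

The special ".value" entry tells pivot_longer() to use the first part of each column name (x or y) as the name of an output column, so each study contributes one row per observation.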
as_tibble(anscombe_tidy)
# A tibble: 44 × 4
   study    id     x     y
   <chr> <int> <dbl> <dbl>
 1 1         1    10  8.04
 2 2         1    10  9.14
 3 3         1    10  7.46
 4 4         1     8  6.58
 5 1         2     8  6.95
 6 2         2     8  8.14
 7 3         2     8  6.77
 8 4         2     8  5.76
 9 1         3    13  7.58
10 2         3    13  8.74
# ℹ 34 more rows
lm(y ~ x, data = anscombe_tidy)

Call:
lm(formula = y ~ x, data = anscombe_tidy)

Coefficients:
(Intercept)            x  
     3.0013       0.4999  

Anscombe’s Quartet: Pooled plot 🙆‍♂️

ggplot(data = anscombe_tidy, 
       mapping = aes(x = x, y = y)) +
  geom_smooth(method = lm, se = FALSE, color = "grey") +
  geom_point() + 
  coord_equal() +
  theme_bw()

Anscombe’s Quartet: Split plot 🤯

ggplot(data = anscombe_tidy, 
       mapping = aes(x = x, y = y, color = study)) +
  geom_smooth(method = lm, se = FALSE, color = "grey") +
  geom_point() +
  coord_equal() +
  facet_wrap(~ study) +
  theme_bw()

Anscombe 2.0: Cairo; Matejka & Fitzmaurice


Pew research

Which of the following describes the pattern on the plot?

A. In recent years, the rate of cavities has increased in many countries

B. In some countries, people brush their teeth more frequently than in other countries

C. The more sugar people eat, the more likely they are to get cavities

D. In recent years, the consumption of sugar has increased in many countries

Graphing types

  1. Position along a common scale: Spatial location along a common baseline to represent data

    • Example: Bar chart where the length of each bar represents a certain value, and each bar starts from the same baseline.
  2. Position along non-aligned scales: Position is still used, but the baselines differ

    • Example: Grouped bar chart where each group has a different baseline
  3. Length, direction, angle: Comparisons based on length are fairly accurate. Direction is slightly less so, and angle even less.

    • Example: A bar chart (length) is usually more effective than a pie chart (angle)

Graphing types


  4. Area: Area is less accurately perceived than the above

    • Example: Circle size in a bubble chart
  5. Volume and curvature: These are difficult for us to evaluate accurately

    • Example: 3D charts that use volume to represent data
  6. Shading and colour saturation: These are the least accurately perceived

    • Example: Heatmap
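
To make the position-versus-angle comparison concrete, the sketch below draws the same made-up data as a bar chart and as a pie chart in ggplot2. The shares data frame and its values are hypothetical, chosen so the groups are close in size; the ranking is easy to read from the bars and much harder to read from the pie.

```r
library(ggplot2)

# Hypothetical shares that are deliberately close to each other
shares <- data.frame(
  group = c("A", "B", "C", "D", "E"),
  value = c(23, 21, 20, 19, 17)
)

# Position along a common scale: every bar starts from the same baseline
bar_plot <- ggplot(shares, aes(x = group, y = value)) +
  geom_col() +
  labs(title = "Bar chart (position/length)") +
  theme_minimal()

# Angle: a pie chart is a single stacked bar drawn in polar coordinates
pie_plot <- ggplot(shares, aes(x = "", y = value, fill = group)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  labs(title = "Pie chart (angle)") +
  theme_void()

bar_plot
pie_plot
```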

Hierarchy by Munzner (2014)


  • How would you test this?

Experimental evidence (!!)

  • Cleveland and McGill (1984): Foundational experiments
  • Heer and Bostock (2010): Using Amazon MTurk crowd-sourcing sample
  • Davis et al. (2022): Account for differences in respondent characteristics
  • Position > angle ? area ? volume \(\Rightarrow\) Position rules!

Common issues with graphs

  • Not enough information (e.g. improper or missing labels)
  • Hard to compare data (e.g. using 3D shapes)
  • Lacking transparency (e.g. about scales)
  • Cutting data (e.g. using binary indicators instead of averages)
  • Too cluttered/Too much data
  • More examples here
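
As one illustration of the scale issue, the sketch below plots the same hypothetical numbers twice: once with a truncated y-axis and once with the axis starting at zero. The approval data frame and its values are invented for the example. The truncated version makes a three-point change look dramatic, while the zero-based version shows it in proportion.

```r
library(ggplot2)

# Hypothetical data: a small change over three years
approval <- data.frame(
  year  = c("2021", "2022", "2023"),
  share = c(50, 52, 53)
)

# Truncated axis: the bars no longer start at zero, so lengths mislead
truncated_plot <- ggplot(approval, aes(x = year, y = share)) +
  geom_col() +
  coord_cartesian(ylim = c(49, 54)) +
  labs(title = "Truncated y-axis", y = "Share (%)") +
  theme_minimal()

# Zero baseline: bar length is proportional to the value
zero_based_plot <- ggplot(approval, aes(x = year, y = share)) +
  geom_col() +
  labs(title = "Y-axis starting at zero", y = "Share (%)") +
  theme_minimal()

truncated_plot
zero_based_plot
```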

Which plots we use

  • Workhorses

    • Histograms

    • Scatterplots

    • Time trends

    • Dot-whisker / Box plots

  • …Ponies
  • …Unicorns
gapminder |> 
  ggplot2::ggplot(
    mapping = aes(x = gdpPercap)) +
  ggplot2::geom_histogram(bins = 30, fill = "lightgrey", color = "black") +
  ggplot2::labs(title = "Histogram of GDP Per Capita") +
  ggplot2::theme_minimal()

gapminder |> 
  ggplot2::ggplot(
    mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
  ggplot2::geom_point(alpha = 0.2) +
  ggplot2::stat_smooth(se = FALSE) +
  ggplot2::scale_x_log10() +
  ggplot2::labs(
    title = "Life Expectancy vs GDP Per Capita by Continent",
    x = "log(gdpPercap)") +
  ggplot2::theme_minimal()

# Continent-level average life expectancy per year, stacked on top of the country rows
gapminder |> 
  dplyr::group_by(continent, year) |> 
  dplyr::summarise(lifeExp = mean(lifeExp, na.rm = TRUE), .groups = "drop") |> 
  dplyr::mutate(type = "Continent average") |> 
  dplyr::bind_rows(gapminder) |> 
  # as.character() keeps ifelse() from returning the factors' integer codes
  dplyr::mutate(type = ifelse(is.na(type), "Country", type),
                country = ifelse(is.na(country), 
                                 as.character(continent), as.character(country))) |> 
  ggplot2::ggplot(
    mapping = aes(x = year, y = lifeExp, 
                  group = country, color = type, alpha = type)) +
  ggplot2::geom_line(linewidth = 0.8, show.legend = FALSE) +
  ggplot2::facet_wrap(~continent, ncol = 2) +
  # Solid line for the continent average, faint lines for individual countries
  ggplot2::scale_alpha_manual(values = c("Continent average" = 1, "Country" = .1)) +
  ggplot2::labs(
    title = "Trends in Life Expectancy by Continent") + 
  ggplot2::theme_minimal() 

gapminder |> 
  ggplot2::ggplot(
    mapping = aes(x = continent, y = lifeExp)) +
  ggplot2::geom_boxplot(outlier.colour = "hotpink") +
  ggplot2::geom_jitter(position = position_jitter(width = 0.1, height = 0), 
                       alpha = 0.25) +
  ggplot2::labs(title = "Box Plot of Life Expectancy by Continent") +
  ggplot2::theme_minimal()
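
The workhorse list also mentions dot-whisker plots, which the examples above do not cover. A minimal sketch, assuming the gapminder, dplyr, and ggplot2 packages: it plots mean life expectancy by continent in 2007 with approximate 95% intervals (mean ± 1.96 standard errors). The year, the normal approximation, and the name life_summary are choices made for illustration only.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Mean and approximate 95% interval of life expectancy by continent in 2007
life_summary <- gapminder |>
  filter(year == 2007) |>
  group_by(continent) |>
  summarise(
    mean_life = mean(lifeExp),
    se = sd(lifeExp) / sqrt(n()),
    .groups = "drop"
  ) |>
  mutate(
    lower = mean_life - 1.96 * se,
    upper = mean_life + 1.96 * se
  )

# Dot = point estimate, whisker = interval
ggplot(life_summary, aes(x = continent, y = mean_life)) +
  geom_pointrange(aes(ymin = lower, ymax = upper)) +
  labs(
    title = "Mean Life Expectancy by Continent, 2007",
    y = "Life expectancy (mean with approx. 95% interval)"
  ) +
  theme_minimal()
```

With only two countries in Oceania, the interval for that continent is a rough approximation at best, which is itself worth mentioning when presenting such a plot.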

Resources



Next time



  • Practice with summary tables and workhorses in ggplot2

  • Q&A

References

Cleveland, William S., and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association 79 (387): 531–54. https://doi.org/10.1080/01621459.1984.10478080.
Davis, Russell, Xiaoying Pu, Yiren Ding, Brian D. Hall, Karen Bonilla, Mi Feng, Matthew Kay, and Lane Harrison. 2022. “The Risks of Ranking: Revisiting Graphical Perception to Model Individual Differences in Visualization Performance.” IEEE Transactions on Visualization and Computer Graphics, 1–16. https://doi.org/10.1109/TVCG.2022.3226463.
Heer, Jeffrey, and Michael Bostock. 2010. “Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 203–12.
Munzner, Tamara. 2014. Visualization Analysis and Design. CRC Press.